Abstract

It is commonly believed that scale-free networks are robust to massive numbers of random
node deletions. For example, Cohen et al. in (1) study scale-free networks including some
which approximate the measured degree distribution of the Internet. Their results suggest
that if each node in this network failed independently with probability 0.99, the remaining
network would continue to have a giant component. In this paper, we show that a large and
important subclass of scale-free networks is not robust to massive numbers of random
node deletions for practical purposes. In particular, we study finite scale-free networks
which have minimum node degree of 1 and a power-law degree distribution beginning with
nodes of degree 1 (power-law networks). We show that, in a power-law network approximating
the Internet's reported distribution, when the probability of deletion of each node
is 0.5 only about 25% of the surviving nodes in the network remain connected in a giant
component, and the giant component does not persist beyond a critical failure rate of 0.9.
The new result is partially due to improved analytical accommodation of the large number
of degree-0 nodes that result after node deletions. Our results apply to finite power-law
networks with a wide range of power-law exponents, including Internet-like networks. We
give both analytical and empirical evidence that such networks are not generally robust to
massive random node deletions.

Key words: fault tolerance, scale-free networks, Internet resilience, distributed systems, 
graph algorithms 



1 Introduction 



Scale-free networks (SFNs) are massive networks whose node-degree distribution
follows a power law in the tail of the distribution. Power-law networks (PLNs)
are the class of scale-free networks which have a minimum degree of 1 and which
follow a power law beginning at degree 1. Many real-world networks, such as
the Internet, the web graph, and many social networks, have been observed to be
scale-free (0; S S li 8). Because of the prevalence of these networks, the rela- 
tionship between a network's degree distribution and its robustness to random node 
deletions has been studied, and the common belief is that scale-free networks are 
very robust to this kind of failure. The original work on this subject led to the 
claim that the Internet would retain a giant component even if 99% of its nodes 
were removed at random. 

To study this desirable property of scale-free networks, we modeled the effects
of random failures on a graph's degree distribution. This revealed that power-law
networks are not generally robust to random node failures. In the case of the widely
cited Internet resilience result (1) of a power-law network with slope parameter
$\beta = 2.5$ and minimum degree 1 (0; H), high failure rates lead to the orphaning
of a large fraction of the surviving nodes, and to the complete disintegration of
the giant component after 90% of the network has failed. This high critical failure
rate may appear to suggest robustness, but as the failure rate increases, the giant
component captures a diminishing fraction of the surviving nodes. For example,
when $\beta = 2.5$, a PLN's giant component initially represents 60% of the network
but comprises only 25% of the surviving network by the time half the network has
failed. As $\beta$ increases this decay becomes more dramatic, and the critical failure
rate decreases.

The main results of this paper are as follows. We estimate the surviving degree
distribution of a PLN after random node deletions in order to capture our simulated
results. The graph that remains after random node deletions is approximately
a PLN, and its degree distribution can be conservatively estimated with similar
parameters. We show analytically how to derive these parameters from the initial PLN
slope $\beta$ and the failure rate $p$, and use these parameters to identify the critical failure
rate for a PLN. Our empirical results validate and expand these analytical results by
showing when simulated PLNs break down and how the giant component decays
as a function of $p$.

We observe that a large and important class of scale-free networks decays rapidly
and has critical failure rates due to finite size effects, and conclude they are not
resilient to massive random failures. In practice, dynamic failures are likely to be
of more interest when considering a real network's resilience, but the simultaneous
random failure model is also useful for studying the structure of subpopulations in
a network. Our result is applicable to the study of distributed collaborative filters,
robust networks, and epidemiology. If real-world PLNs such as the Internet
and disease pathway networks exhibit robustness, we do not believe it is because of
their power-law distribution, and further explanations must be sought. More highly
assortative networks, with similar distributions but in which connectivity is biased
in favor of connecting similar nodes (10), are worth further study.

2 Related Work



A large body of work has been published on graphs and their properties, and our 
work builds upon the work in scale-free networks. The formal treatments are based 
in physics, statistical mechanics, computer science, and mathematics (Q; S S EH;
12; 13; 14; 15) and describe mathematical properties of graphs, including those
derived from the assumption that SFN node-degree distributions follow a power 
law in the tail. The empirical work (SSSSS) is aimed at capturing or sampling 
the degree distribution and other properties of real-world networks such as Internet 
routing, web pages, or social networks in order to determine whether the observed 
systems are scale free, and to compare observations to the theoretical properties. 
Many authors observe that Internet communities tend to form scale-free networks,
although for the Internet itself this empirical work is based on traceroute sampling,
which has been called into question (16).

Scale-free networks were originally of interest to us because of their published
resilience to random failures (1; 8), which implied that random subpopulations in
an SFN had a good chance of being highly connected. Our interest in subpopulations
lies in distributed multi-agent systems and distributed recommender systems,
wherein it is desirable that disinterested parties not be required to process or
forward messages (17). The spread of information and pathogens has also been
studied, as have many other families of graphs (3; 18; 19; 20).

It is important to be precise in comparing our work with the result in (1). As part
of extremal graph theory, scale-free and power-law network percolation thresholds 
are defined if they hold in the limit as the size of the network goes to infinity. That 
result holds in the limit for SFNs with small minimum degree, but does not hold 
for finite PLNs of minimum degree 1 (as many workers in the field have come to 
assume). We seek to correct the generalization, and here we analyze the effects of 
finite system size on the percolation threshold for this special case of SFN. 



3 Random Failures in Power-Law Networks 



We consider the class of scale-free networks with minimum degree 1 whose degree
distributions begin following a power law at degree 1. We refer to these as
power-law networks, or PLNs. PLNs have been analyzed in some detail (2; 21) but
we further wish to understand the properties of the subgraph which remains after
random node deletions. Let the number of nodes of degree $k$ in an initial graph $G$
be $c(G, k) = e^{\alpha} k^{-\beta}$, with power-law slope $\beta$ ($0 < \beta < \beta_0 = 3.47875\ldots$), scale
parameter $\alpha$, and minimum degree 1 (2). For PLNs there is no giant component when
$\beta > \beta_0$ (2). We denote the total number of vertices of degree $k \ge 1$ in $G$ as $|G|$, and
count degree-0 nodes separately as they appear. Given the parameters for $G$, and a
failure probability $p$, it is reasonable to ask whether the surviving graph $G'$ is a PLN.
If so, we wish to know its corresponding parameters $\alpha'$ and $\beta'$, and whether $G'$ has
a giant component (i.e., a connected portion of the graph with $\Theta(|G'|)$ nodes). If $G'$
is a PLN (apart from orphaned nodes) with $\beta' < \beta_0$ then a giant component almost
surely exists, so it suffices to show when this condition holds. Our derivations indicate
that critical failure rates ($p$ such that $\beta' = \beta_0$) only appear when $\beta > 2$, and we
restrict ourselves to that case here.

Fig. 1. The degree distribution of the remaining graph when a fraction p = 5%-95% of the
nodes in a PLN have failed, for $\beta = 2.5$ and $\alpha$ chosen such that the maximum degree was
130. The graph follows a rough power law with an increase in slope and a growing rolloff.

Our initial numerical and experimental work indicated that when $G$ is a PLN, the
number of surviving nodes in $G'$ of degree $k \ge 1$ follows an approximate power-law
degree distribution (Figure 1). In finite networks, these surviving distributions
will exhibit a rolloff. The observed rolloffs in these distributions make the graph
less robust: the rolloff is comparable to targeted attacks, which cause SFNs to
collapse more quickly (22; 23). Also, the low-degree behavior of the surviving graph
remains very consistent, suggesting the critical slope will not be affected (changes to
low-degree behavior, such as raising the minimum degree, can dramatically increase
$\beta_0$ or eliminate the critical slope entirely). Using a power-law approximation with
no rolloff and comparing the slope to the same critical $\beta_0$ should thus give us a
reasonable bound on robustness to random failures.

The number of nodes $|G|$ in a PLN with $\beta > 1$ is approximately $\zeta(\beta) e^{\alpha}$ (2), using
the Riemann zeta function $\zeta(t) = \sum_{n=1}^{\infty} n^{-t}$. This gives the expected number of
nodes of degree $k \ge 1$ in $G$ and $G'$. For $G'$ we will also account for orphaned nodes
(nodes which have not failed but whose neighbors have all failed, leaving them with
degree 0). For us this is crucial: orphans should not be considered faulty, as they
are only isolated members of the subpopulation of interest.
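
Assuming $G'$ is suitably approximated with a power law, it remains to derive the
parameters of $G'$ and determine from $\beta$ and $p$ when $\beta' < \beta_0$. If failures occur with
failure probability $p$ in a graph with degree distribution $c(G, k) = e^{\alpha} k^{-\beta}$, the new
degree distribution (tightly bounded around its expectation) will be

$$c(G', k) = (1 - p) \sum_{k_0 = \max(k, 1)}^{\infty} \frac{e^{\alpha}}{k_0^{\beta}} \binom{k_0}{k} (1 - p)^{k} p^{k_0 - k},$$

which for degrees 0 and 1 reduces to

$$c(G', 0) = (1 - p) e^{\alpha} \chi, \qquad \chi = \sum_{k_0 = 1}^{\infty} \frac{p^{k_0}}{k_0^{\beta}}, \quad \text{and}$$

$$c(G', 1) = (1 - p) e^{\alpha} \epsilon, \qquad \epsilon = \sum_{k_0 = 1}^{\infty} \frac{k_0 (1 - p) p^{k_0 - 1}}{k_0^{\beta}}$$

(noting the definitions of $\chi$ and $\epsilon$). Estimating the new distribution with a power
law $c(G', k) \approx e^{\alpha'} k^{-\beta'}$ gives $c(G', 1) \approx e^{\alpha'} = (1 - p) e^{\alpha} \epsilon$, and we directly obtain
$\alpha' = \alpha + \ln((1 - p)\epsilon)$. To find $\beta'$, note that of the original $|G|$ nodes there are an
expected $p|G|$ failed nodes, $c(G', 0)$ orphaned nodes, and $|G'|$ nodes captured by
the size estimate of the new graph given the new parameters. For $\beta > 1$,

$$|G| = p|G| + |G'| + c(G', 0)$$
$$\zeta(\beta) e^{\alpha} = p\,\zeta(\beta) e^{\alpha} + \zeta(\beta') e^{\alpha'} + (1 - p) e^{\alpha} \chi$$
$$\zeta(\beta) e^{\alpha} = p\,\zeta(\beta) e^{\alpha} + \zeta(\beta') (1 - p) e^{\alpha} \epsilon + (1 - p) e^{\alpha} \chi$$

and solving for $\beta'$ gives

$$(1 - p)\,\zeta(\beta) e^{\alpha} = \zeta(\beta') (1 - p) e^{\alpha} \epsilon + (1 - p) e^{\alpha} \chi$$
$$\zeta(\beta) = \zeta(\beta')\,\epsilon + \chi$$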

Numerical estimation shows that $\beta' > \beta$, and this difference varies with $p$. Figures 2
and 3 show that for $\beta > 2$ there are critical failure rates $p_c$ for which $\beta' = \beta_0$,
beyond which the surviving graph will not have a giant component. This shows
that power-law networks are not generally robust to random node failures. However,
our result depends upon a potentially coarse approximation and (although we
have noted this approximation should certainly result in an upper bound on $p_c$) we
would like an indication of how accurate our bounds are, and some validation of
our methodology. The next section will present additional evidence gathered by
observing failures in simulated graphs.
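
The numerical estimation described above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the relation $\zeta(\beta) = \zeta(\beta')\epsilon + \chi$ with $\chi = \sum_{k_0 \ge 1} p^{k_0} k_0^{-\beta}$ and $\epsilon = \sum_{k_0 \ge 1} k_0 (1 - p) p^{k_0 - 1} k_0^{-\beta}$, and it assumes that $\beta'$ grows monotonically with $p$ so that bisection applies. The function names (`zeta`, `chi_eps`, `beta_prime`, `critical_failure_rate`), the truncation tolerances, and the search brackets are all hypothetical choices made for this sketch.

```python
BETA_0 = 3.47875  # critical PLN slope quoted in the text


def zeta(s, n=200):
    """Riemann zeta for s > 1: partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail


def chi_eps(beta, p, tol=1e-16):
    """Series for chi (orphans) and eps (degree-1 survivors), 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    chi = eps = 0.0
    k, p_pow = 1, 1.0  # p_pow holds p**(k-1)
    while p_pow > tol:
        chi += p_pow * p / k ** beta
        eps += k * (1.0 - p) * p_pow / k ** beta
        p_pow *= p
        k += 1
    return chi, eps


def beta_prime(beta, p):
    """Solve zeta(beta) = zeta(beta') * eps + chi for beta' by bisection."""
    chi, eps = chi_eps(beta, p)
    target = (zeta(beta) - chi) / eps  # value that zeta(beta') must take
    lo, hi = 1.0 + 1e-9, 50.0          # assumed search bracket
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if zeta(mid) > target:
            lo = mid  # zeta is decreasing, so beta' lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_failure_rate(beta, beta_0=BETA_0):
    """Bisection on p for beta'(beta, p) = beta_0 (assumes monotonicity in p)."""
    lo, hi = 0.0, 0.999
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if beta_prime(beta, mid) < beta_0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, `beta_prime(2.5, 0.0)` recovers the original slope, and `critical_failure_rate(2.5)` should fall near the roughly 90% critical rate discussed in the text.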

Fig. 2. The function $\beta'(\beta, p)$ for $1.1 \le \beta \le 3.5$ and $0 \le p \le 99\%$. For $\beta' > \beta_0$ (intersecting
plane), no giant component exists in $G'$. An Internet-like PLN with $\beta = 2.5$ will cease
to have a giant component when p = 89.8% (point emphasized).

Fig. 3. The critical failure rate $p$ for which $\beta'(\beta, p) = \beta_0$, for $2 < \beta < \beta_0$. The critical
points for the curves of $\beta = 2.5$ (p = 89.8%), 3.0 (p = 58.0%), and 3.3 (p = 25.4%) are
emphasized for their correspondence with empirically studied real networks.

4 Simulation Methods and Results 



Using $c(G, k) = e^{\alpha} k^{-\beta}$ we generated node-degree histograms matching a power
law, and recorded $(\beta, \alpha)$ pairs that produced histograms averaging one million
nodes. The histograms were used to populate an array of vertex-numbered "edge
stubs," the stubs were permuted randomly to create a random configuration (24),
and pairs of stubs were added as edges in a multigraph.
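
The stub-matching construction just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the rounding rule (taking the floor of $e^{\alpha} k^{-\beta}$) and the handling of an odd stub total (dropping one stub) are choices made here for concreteness, and the function names are hypothetical.

```python
import math
import random


def degree_histogram(alpha, beta):
    """Node counts per degree k >= 1, assuming floor(e^alpha * k^-beta)."""
    hist = {}
    k = 1
    while True:
        count = int(math.exp(alpha) * k ** -beta)
        if count < 1:
            break
        hist[k] = count
        k += 1
    return hist


def configuration_multigraph(hist, rng):
    """Shuffle vertex-numbered edge stubs and pair them into multigraph edges.

    Returns (n_nodes, edges). Self-arcs and redundant arcs are kept.
    """
    stubs = []
    node = 0
    for k, count in sorted(hist.items()):
        for _ in range(count):
            stubs.extend([node] * k)  # one stub per unit of degree
            node += 1
    if len(stubs) % 2 == 1:
        stubs.pop()  # assumption: drop one stub when the total is odd
    rng.shuffle(stubs)
    edges = list(zip(stubs[0::2], stubs[1::2]))
    return node, edges
```

For example, `configuration_multigraph(degree_histogram(7.0, 2.5), random.Random(0))` builds a small graph of roughly 1,500 nodes; a value of $\alpha$ near 13.5 would be needed to approach one million nodes at $\beta = 2.5$.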

The random configuration produces multigraphs which match a node degree distri- 
bution, but it is reasonable to wonder if redundant arcs and self-arcs are frequent 
enough to conflict with the assumptions of independence implicit in our derivations. 

Fig. 4. Simulation results for $\beta = 2.5$, and $\alpha$ chosen such that $n \approx 10^6$. Initial giant
component size and critical failure rate vary with $\beta$ but this behavior is representative of
$2 < \beta < \beta_0$. Note the decreasing fractional size of the giant component (○), and the
increasing population of orphans (□) and nodes in small components in general (△).

From ^ we estimated their likelihood given the number of edges in the graph and 
the highest expected degree, and established that they are infrequent. For $\beta > 2$
the likelihood of an individual edge of the highest-degree node being a self-arc is
$2 / (\zeta(\beta - 1) e^{\alpha - \alpha/\beta})$, which approaches zero in large PLNs.

We used an iterated 3-coloring algorithm to identify the giant and secondary components
in $G$. To simulate failures, nodes were pre-colored with probability $p$ and
the algorithm run again to collect the components of $G'$. For each $\beta$ and $p$ twenty
independent graphs were created, with means and standard deviations collected for
several statistics.
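
The failure experiment can be illustrated with a short sketch. Note that this does not reproduce the iterated 3-coloring algorithm mentioned above; it swaps in a union-find (disjoint-set) structure, which yields the same connected components. The function name, the return format, and the decision to treat a survivor whose only arcs are self-arcs as an orphan are assumptions made for this example.

```python
import random
from collections import Counter


def surviving_components(n_nodes, edges, p, rng):
    """Delete each node independently with probability p.

    Returns (n_survivors, n_orphans, component sizes in descending order),
    where an orphan is a survivor with no surviving neighbor.
    """
    alive = [rng.random() >= p for _ in range(n_nodes)]
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    has_neighbor = [False] * n_nodes
    for u, v in edges:
        if alive[u] and alive[v] and u != v:  # self-arcs connect nothing
            has_neighbor[u] = has_neighbor[v] = True
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    sizes = Counter(find(v) for v in range(n_nodes) if alive[v])
    n_survivors = sum(alive)
    n_orphans = sum(1 for v in range(n_nodes) if alive[v] and not has_neighbor[v])
    return n_survivors, n_orphans, sorted(sizes.values(), reverse=True)
```

The first entry of the returned size list is the largest surviving component and the second entry is the runner-up, which are the quantities tracked in the experiments; averaging over repeated calls with independent graphs and seeds would give the means and standard deviations.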

For $2 \le \beta \le 3.5$ and $0 \le p \le 0.99$ we computed the size of the first and second
largest components, the number of surviving nodes outside the largest component,
and the number of orphans. The range was chosen to include 2, a transition point
of interest in regards to the density of edges in a PLN (0), and 3.5, to demonstrate
random failures in a graph with no giant component. For $\beta = 2.5$ the data can be
seen in Figure 4.

For those familiar with the literature, the initial size of the giant component in
Figure 4 may come as a surprise. For $\beta < 2$ the fraction of the graph in the giant
component is indirectly given in (12 lb) and it is known to comprise virtually the
entire graph. The probability that a random PLN node is in the giant component is
approximately 1 minus a correction term, both for $\beta = 2$ and for $\beta < 2$, and in
both cases it approaches 1 in the limit. For $2 < \beta < \beta_0$ no such equation
has been published, but in simulations the fractional size of the giant component
decreases prior to its complete dissolution at $\beta_0$ as shown in Figure 5, subject to
some scale effects. We have not seen this published elsewhere and the result was
somewhat unexpected, so we include it here.

Fig. 5. The fraction of nodes in the giant component (in simulation) for $2 < \beta < \beta_0$,
when $n \approx 10^6$. These numbers varied with $\alpha$; for $\beta = 2$ the fraction of nodes in the giant
component approaches 1 in the limit (EH), but for practical purposes (and possibly in the
limit) the giant component for $\beta > 2$ can be a small fraction of the graph.

Figure 6 presents the central result of our simulations against our mathematical
predictions. The solid lines depict $|G'|/|G|$ (i.e., $(1 - p)\zeta(\beta') e^{\alpha} \epsilon / (\zeta(\beta) e^{\alpha})$) for
select values of $\beta$; the difference between these curves and the ideal (the diagonal,
$(1 - p)|G|$) is the number of orphaned nodes. These curves are terminated and
vertical lines are drawn at the point when $\beta' = \beta_0$. Over these curves are plotted
pointwise observations of these quantities in simulations of networks with $10^6$
nodes. The match is virtually exact, as one might expect: the combinatorics of the
predictions is simply being exercised stochastically in the simulation. Finally and
most importantly, Figure 6 plots the decreasing size of the giant component in the
graph for comparison with the vertically distinguished critical failure rates. In this
case the simulation is being compared to our approximation, and we see that the
giant component falls away to virtually nothing as the failure rate approaches the
predicted critical point.

We conclude with a graph emphasizing the decay of the giant component itself,
Figure 7. While $\beta' < \beta_0$ some constant-order (although potentially small) fraction
of the edge-bearing nodes in $G'$ will almost surely remain in a giant component. Let
$m$ be the size of the giant component in $G$ and $m'$ its size in $G'$. Then ideally
$m' = (1 - p)m$, but this is clearly not the case. This graph also magnifies the disintegration
of the giant component shown in Figure 6, particularly in the extreme case of
$\beta = 3.3$. In this case, the giant component is a small (but constant-order) fraction of the
graph to begin with, and as the graph decays it is subject to greater uncertainty in
its fate (this curve was the only one with a substantive standard deviation), so that
it is not clear that it decays. For this case, we include the average size of the second
largest component, divided by $m$ (this value is graphically indistinguishable from
zero in the other three cases). Comparing $m'/m$ with this ratio, it appears
that the giant component has lost its status by the predicted critical point.

Fig. 6. Predicted and actual unorphaned survivors, predicted critical failure rates, and giant
component size, for $\beta = 2.0$, 2.5, 3.0, and 3.3, as a function of $p$. The predicted number
of unorphaned survivors, $(1 - p)\zeta(\beta') e^{\alpha} \epsilon / (\zeta(\beta) e^{\alpha})$, closely matches observed unorphaned
survivors (as expected of a tightly bounded combinatorial result). More importantly, the
giant component sizes (○ □ △) fall away close to the predicted critical failure rates (vertical
drop). Giant component sizes for $\beta = 3.0$ and 3.3 are truncated for clarity at 75% and 50%,
respectively.

Fig. 7. The ratio of the size of the giant component remaining after failure to its size before
failure, for $\beta = 2.0$, 2.5, 3.0, and 3.3. For $\beta = 3.3$ the standard deviation is shown.
For $\beta = 3.3$ the giant component is a fairly small fraction of the graph, and the giant
component's disintegration is shown by observing that the fractional size of the second largest
component falls within this range by the predicted critical failure rate of 25.4%. For $\beta = 3.0$
and 3.3 the plots are truncated for clarity at 75% and 50%, respectively.

5 Conclusions and Future Work



It is probably clear to any network administrator that static graph theory has little 
to say about the resilience of the Internet to random failures over time: the Internet 
is a massive computer network with people responsible for repairing faults as they 
occur. Nevertheless, the random simultaneous failure model is appropriate when 
reasoning about the connectivity of randomly distributed subpopulations in the net- 
work. We believe we have shed some light on the subtleties of a result that has been 
applied too generally. By explicitly considering orphans in the failure process of a 
PLN, and considering graphs of minimum degree 1, we have refined the Internet 
resilience result of (1) for finite networks. Thus we have better estimated when such
a network would completely disintegrate as a function of its initial slope parameter 
and failure rate. In particular we have what we believe to be a more accurate critical 
failure rate for the Internet, and we show that this is not as resilient as originally 
suggested. 

Because we are citing it so extensively, it is worth stating two additional observations
in (1), to avoid confusion. Cohen et al. make casual reference to the breakdown
of the SFN giant component under moderate failure levels when $\beta = 3.5$.
Although reasonable in general, in pure power-law networks there is no giant component
when $\beta > \beta_0 = 3.47875\ldots$. Also confusing may be their graph of the
fractional size of the giant component remaining as a function of the failure rate $p$,
when $\beta = 2.5$ (a graph analogous to our Figure 7). This ratio is graphed in such
a way that it appears to very closely follow, and even exceed, $(1 - p)n$; in other
words, the surviving component's size appears to exceed the expected number of
survivors. In fact the figure graphs this quantity divided by $(1 - p)$, ^ and thus
matches our results in Figure 7 for the values of $\beta$ they present (2.5 and 3.5).
This graph has been reproduced in several places without elaboration Qi). 

We have analytically considered these matters for finite PLNs under the full range
$0 < \beta < \beta_0$, but have not yet confirmed our results in simulation for $\beta < 2$. Observing
how our results vary for networks several orders of magnitude larger
and smaller also remains to be done. Doing so will require a more substantial simulation
than we have implemented. Beyond pure power-law degree distributions, a
more general class of PLNs with rolloff and offset terms should also be studied. In
particular, most real-world networks that approximate a power law exhibit a rolloff.
Finally, for $2 < \beta < \beta_0$ we have been unable to find a derivation of the fractional
size of the giant component of the graph in the limit, and such a formula would
be of interest. Extending the model to consider assortative networks, conditional
failure models, and other variations that affect the critical failure threshold could
lead to a number of interesting new results.

^ This discrepancy between figure and text has been confirmed by the authors in 